Activating project at `~/Documents/github.com/ucla-biostat-257/2025spring/slides/22-juliaopt`
Status `~/Documents/github.com/ucla-biostat-257/2025spring/slides/22-juliaopt/Project.toml`
[1e616198] COSMO v0.8.9
[13f3f980] CairoMakie v0.13.5
[61c947e1] Clarabel v0.10.0
[f65535da] Convex v0.16.4
[31c24e10] Distributions v0.25.120
[5789e2e9] FileIO v1.17.0
⌅ [f6369f11] ForwardDiff v0.10.38
[2e9cd046] Gurobi v1.7.4
[b99e6be6] Hypatia v0.8.2
[916415d5] Images v0.26.2
[b6b21f68] Ipopt v1.10.3
[4076af6c] JuMP v1.25.0
[2ddba703] Juniper v0.9.3
[b8f27783] MathOptInterface v1.40.0
[1ec41992] MosekTools v0.15.9
[7f7a1694] Optimization v4.2.0
[fd9f6733] OptimizationMOI v0.5.3
[4e6fcdb7] OptimizationNLopt v0.3.2
[36348300] OptimizationOptimJL v0.4.3
[c946c3f1] SCS v2.1.0
[276daf66] SpecialFunctions v2.5.1
[10745b16] Statistics v1.11.1
[37e2e46d] LinearAlgebra v1.11.0
[9a3f8284] Random v1.11.0
[2f01184e] SparseArrays v1.11.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`
1 Optimization in Julia
This lecture gives an overview of some optimization tools in Julia.
1.1 Flowchart
Statisticians perform optimization daily: maximum likelihood estimation, machine learning, …
Categories of optimization problems:
Problems with analytical solutions: least squares, principal component analysis, canonical correlation analysis, …
Problems subject to Disciplined Convex Programming (DCP): linear programming (LP), quadratic programming (QP), second-order cone programming (SOCP), semidefinite programming (SDP), and geometric programming (GP).
Nonlinear programming (NLP): Newton-type algorithms, Fisher scoring algorithm, EM algorithm, MM algorithms.
Large scale optimization: ADMM, SGD, …
Flowchart
1.2 Modeling tools and solvers
Getting familiar with good optimization software broadens the scope and scale of problems we are able to solve in statistics. The following table lists some of the best optimization software.
|                      | LP | MILP | SOCP | MISOCP | SDP | GP | NLP | MINLP | R | Matlab | Julia | Python | Cost |
|----------------------|----|------|------|--------|-----|----|-----|-------|---|--------|-------|--------|------|
| modeling tools       |    |      |      |        |     |    |     |       |   |        |       |        |      |
| cvx                  | x  | x    | x    | x      | x   | x  |     |       | x | x      |       | x      | A    |
| Convex.jl            | x  | x    | x    | x      | x   |    |     |       |   |        | x     |        | O    |
| Optimization.jl      | x  | x    | x    | x      |     |    | x   | x     |   |        | x     |        | O    |
| JuMP.jl              | x  | x    | x    | x      |     |    | x   | x     |   |        | x     |        | O    |
| MathOptInterface.jl  | x  | x    | x    | x      |     |    | x   | x     |   |        | x     |        | O    |
| convex solvers       |    |      |      |        |     |    |     |       |   |        |       |        |      |
| Mosek                | x  | x    | x    | x      | x   | x  | x   |       | x | x      | x     | x      | A    |
| Gurobi               | x  | x    | x    | x      |     |    |     |       | x | x      | x     | x      | A    |
| CPLEX                | x  | x    | x    | x      |     |    |     |       | x | x      | x     | x      | A    |
| SCS                  | x  |      | x    |        | x   |    |     |       |   | x      | x     | x      | O    |
| COSMO.jl             | x  |      | x    |        | x   |    |     |       |   |        | x     |        | O    |
| Hypatia.jl (more cones) | x |     | x    |        | x   |    |     |       |   |        | x     |        | O    |
| NLP solvers          |    |      |      |        |     |    |     |       |   |        |       |        |      |
| NLopt                | x  |      |      |        |     |    | x   |       | x | x      | x     | x      | O    |
| Ipopt                | x  |      |      |        |     |    | x   |       | x | x      | x     | x      | O    |
| KNITRO               | x  |      | x    |        |     |    | x   | x     | x | x      | x     | x      | $    |

O: open source, A: free academic license, $: commercial
Modeling tools such as cvx (for Matlab) and Convex.jl (Julia analog of cvx) implement the disciplined convex programming (DCP) paradigm proposed by Grant and Boyd (2008) http://stanford.edu/~boyd/papers/disc_cvx_prog.html. DCP prescribes a set of simple rules from which users can construct convex optimization problems easily.
Solvers (Mosek, Gurobi, Cplex, SCS, COSMO, Hypatia, …) are concrete software implementations of optimization algorithms. My favorite ones are Mosek/Gurobi/SCS for DCP and Ipopt/NLopt for nonlinear programming. Mosek and Gurobi are commercial software but free for academic use; SCS/Ipopt/NLopt are open source.
Modeling tools are typically solver agnostic: the same model can be handed to a variety of solvers, so users do not have to worry about each solver's specific interface.
If you want to install the commercial solvers Gurobi or Mosek, instructions are below:
Gurobi:
1. Download Gurobi at link.
2. Register an account and request a free academic license at link.
3. Run the `grbgetkey XXXXXXXXX` command in the terminal as suggested. It retrieves a license file and puts it under a specified folder, e.g., /Users/huazhou/Documents/Gurobi/gurobi.lic.
4. Set up the environment variables. On my machine, I put the following two lines in the ~/.julia/config/startup.jl file: ENV["GUROBI_HOME"] = "/Library/gurobi1201/macos_universal2" and ENV["GRB_LICENSE_FILE"] = "/Users/huazhou/Documents/Gurobi/gurobi.lic".
Mosek:
1. Request a free academic license at link. The license file will be sent to your edu email within minutes. Check the Spam folder if necessary.
2. Put the license file at the default location ~/mosek/.
Install the Julia packages Convex.jl, SCS.jl, Gurobi.jl, Mosek.jl, MathOptInterface.jl, NLopt.jl, and Ipopt.jl; the Julia wrappers themselves are open source.
1.3 DCP Using Convex.jl
Standard convex problem classes like LP (linear programming), QP (quadratic programming), SOCP (second-order cone programming), SDP (semidefinite programming), and GP (geometric programming), are becoming a technology.
DCP Hierarchy
1.3.1 Example: microbiome regression analysis
We illustrate optimization tools in Julia using microbiome analysis as an example.
16S microbiome sequencing technology generates sequence counts of various organisms (OTUs, operational taxonomic units) in samples.
Microbiome Data
For statistical analysis, counts are normalized into proportions for each sample, resulting in a covariate matrix \(\mathbf{X}\) with all rows summing to 1. For identifiability, we need to add a sum-to-zero constraint to the regression coefficients. In other words, we need to solve a constrained least squares problem \[
\text{minimize} \frac{1}{2} \|\mathbf{y} - \mathbf{X} \beta\|_2^2
\] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\). For simplicity we ignore intercept and non-OTU covariates in this presentation.
Let’s first generate an artificial data set.
using Random, LinearAlgebra, SparseArrays

Random.seed!(257) # seed

n, p = 100, 50
X = rand(n, p)
# scale each row of X to sum to 1
lmul!(Diagonal(1 ./ vec(sum(X, dims = 2))), X)
# true β is a sparse vector with about 10% non-zero entries
β = sprandn(p, 0.1)
y = X * β + randn(n);
1.3.2 Sum-to-zero regression
The sum-to-zero constrained least squares is a standard quadratic programming (QP) problem and should be easily handled by any QP solver.
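Before tackling the lasso version below, note that the plain sum-to-zero least squares problem has a closed-form solution via its KKT system. A stdlib-only sketch (the toy data here are freshly generated and merely stand in for the \(\mathbf{X}\), \(\mathbf{y}\) above):

```julia
using LinearAlgebra, Random

Random.seed!(257)
# toy data with the same shape as the simulation above
n, p = 100, 50
X = rand(n, p)
y = randn(n)

# KKT system for: minimize ½‖y − Xβ‖²  subject to  1ᵀβ = 0
#   [XᵀX  1] [β]   [Xᵀy]
#   [1ᵀ   0] [λ] = [ 0 ]
A = [X' * X ones(p); ones(1, p) 0.0]
b = [X' * y; 0.0]
sol = A \ b
β̂, λ̂ = sol[1:p], sol[end]
```

At the solution, \(\sum_j \hat\beta_j = 0\) and the stationarity condition \(\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat\beta) = \hat\lambda \mathbf{1}\) holds.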
1.3.3 Sum-to-zero lasso
Suppose we want to know which organisms (OTUs) are associated with the response. We can answer this question using a sum-to-zero constrained lasso \[
\text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \|\beta\|_1
\] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\). Varying \(\lambda\) from small to large values will generate a solution path.
using Convex

# # Use Mosek solver
# using Mosek
# solver = Mosek.Optimizer

# # Use Gurobi solver
# using Gurobi
# solver = Gurobi.Optimizer

# # Use SCS solver
# using SCS
# solver = SCS.Optimizer

# Use Hypatia solver
using Hypatia
solver = Hypatia.Optimizer

# # Use Clarabel solver
# using Clarabel
# solver = Clarabel.Optimizer

# solve at a grid of λ
λgrid = 0:0.01:0.35

# holder for solution path
β̂path = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
# optimization variable
β̂classo = Variable(size(X, 2))
# obtain solution path using warm start
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # define optimization problem
    # objective
    problem = minimize(0.5sumsquares(y - X * β̂classo) + λ * sum(abs, β̂classo))
    # constraint
    problem.constraints += sum(β̂classo) == 0
    solve!(problem, solver; silent = true)
    β̂path[i, :] = β̂classo.value
end
0.516054 seconds (1.37 M allocations: 219.045 MiB, 8.30% gc time, 7.80% compilation time: 100% of which was recompilation)
using CairoMakie

f = Figure()
Axis(
    f[1, 1],
    title = "Sum-to-Zero Lasso",
    xlabel = L"\lambda",
    ylabel = L"\beta_j"
)
series!(λgrid, β̂path', color = :glasbey_category10_n256)
f
1.3.4 Sum-to-zero group lasso
Suppose we want to do variable selection not at the OTU level, but at the Phylum level. OTUs are clustered into various Phyla. We can answer this question using a sum-to-zero constrained group lasso \[
\text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \sum_j \|\mathbf{\beta}_j\|_2
\] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\), where \(\mathbf{\beta}_j\) are regression coefficients corresponding to the \(j\)-th phylum. This is a second-order cone programming (SOCP) problem readily modeled by Convex.jl.
Let’s assume each 10 contiguous OTUs belong to one Phylum.
# # Use Mosek solver
# using Mosek
# solver = Mosek.Optimizer

# # Use Gurobi solver
# using Gurobi
# solver = Gurobi.Optimizer

# # Use SCS solver
# using SCS
# solver = SCS.Optimizer

# Use Hypatia solver
using Hypatia
solver = Hypatia.Optimizer

# # Use Clarabel solver
# using Clarabel
# solver = Clarabel.Optimizer

# solve at a grid of λ
λgrid = 0.0:0.005:0.5
β̂pathgrp = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂classo = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # loss
    obj = 0.5sumsquares(y - X * β̂classo)
    # group lasso penalty term
    for j in 1:(size(X, 2) ÷ 10)
        βj = β̂classo[(10(j - 1) + 1):10j]
        obj = obj + λ * norm(βj)
    end
    problem = minimize(obj)
    # constraint
    problem.constraints = [sum(β̂classo) == 0]
    solve!(problem, solver; silent = true)
    β̂pathgrp[i, :] = β̂classo.value
end
0.618763 seconds (2.12 M allocations: 438.603 MiB, 12.08% gc time)
We see it took Hypatia less than a second to solve this seemingly hard optimization problem at 101 different \(\lambda\) values.
using CairoMakie

f = Figure()
Axis(
    f[1, 1],
    title = "Sum-to-Zero Group Lasso",
    xlabel = L"\lambda",
    ylabel = L"\beta_j"
)
series!(λgrid, β̂pathgrp', color = :glasbey_category10_n256)
f
1.3.5 Example: matrix completion
Load the \(128 \times 128\) Lena picture with missing pixels.
We fill in the missing pixels using a matrix completion technique developed by Candes and Tao \[
\text{minimize } \|\mathbf{X}\|_*
\]\[
\text{subject to } x_{ij} = y_{ij} \text{ for all observed entries } (i, j).
\] Here \(\|\mathbf{M}\|_* = \sum_i \sigma_i(\mathbf{M})\) is the nuclear norm. In words we seek the matrix with minimal nuclear norm that agrees with the observed entries. This is a semidefinite programming (SDP) problem readily modeled by Convex.jl.
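As a quick sanity check on the definition, the nuclear norm is just the sum of singular values and can be computed directly with the LinearAlgebra standard library. A small self-contained illustration (not part of the lecture code; the matrix here is arbitrary):

```julia
using LinearAlgebra, Random

Random.seed!(123)
M = randn(6, 4)
# nuclear norm ‖M‖* = sum of singular values of M
nucnorm = sum(svdvals(M))
```

Since the singular values of \(\mathbf{M}\) are the square roots of the eigenvalues of \(\mathbf{M}^T\mathbf{M}\), the two computations agree, and the nuclear norm always dominates the spectral norm.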
This example takes longer because of the high dimensionality. COSMO.jl seems to be the fastest solver for this problem; other solvers take excessively long.
# Use COSMO solver (fast)
using COSMO
solver = COSMO.Optimizer

# # Use Hypatia solver (slow)
# using Hypatia
# solver = Hypatia.Optimizer()

# # Use Clarabel solver (slow)
# using Clarabel
# solver = Clarabel.Optimizer()

# Linear indices of obs. entries
obsidx = findall(Y[:] .≠ 0.0)
# Create optimization variables
X = Variable(size(Y))
# Set up optimization problem
problem = minimize(nuclearnorm(X))
problem.constraints += X[obsidx] == Y[obsidx]
# Solve the problem by calling solve!
@time solve!(problem, solver)
------------------------------------------------------------------
COSMO v0.8.9 - A Quadratic Objective Conic Solver
Michael Garstka
University of Oxford, 2017 - 2022
------------------------------------------------------------------
Problem: x ∈ R^{32897},
constraints: A ∈ R^{41025x32897} (41281 nnz),
matrix size to factor: 73922x73922,
Floating-point precision: Float64
Sets: DensePsdConeTriangle of dim: 32896 (256x256)
ZeroSet of dim: 8128
Nonnegatives of dim: 1
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
ρ = 0.1, σ = 1e-06, α = 1.6,
max_iter = 5000,
scaling iter = 10 (on),
check termination every 25 iter,
check infeasibility every 40 iter,
KKT system solver: QDLDL
Acc: Anderson Type2{QRDecomp},
Memory size = 15, RestartedMemory,
Safeguarded: true, tol: 2.0
Setup Time: 12945.24ms
Iter: Objective: Primal Res: Dual Res: Rho:
1 -1.4585e+03 1.5985e+01 5.9854e-01 1.0000e-01
25 1.4525e+02 5.1105e-02 1.1356e-03 1.0000e-01
50 1.4758e+02 1.1725e-02 1.4834e-03 6.8658e-01
75 1.4797e+02 5.5160e-04 4.7489e-05 6.8658e-01
100 1.4797e+02 1.7304e-05 1.3870e-06 6.8658e-01
------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 100
Optimal objective: 148
Runtime: 13.625s (13625.11ms)
13.862155 seconds (510.42 k allocations: 512.114 MiB, 0.34% gc time, 1.38% compilation time: 100% of which was recompilation)
Problem statistics
problem is DCP : true
number of variables : 1 (16_384 scalar elements)
number of constraints : 1 (8_128 scalar elements)
number of coefficients : 8_128
number of atoms : 3
Solution summary
termination status : OPTIMAL
primal status : FEASIBLE_POINT
dual status : FEASIBLE_POINT
objective value : 147.9711
Expression graph
minimize
└─ nuclearnorm (convex; positive)
└─ 128×128 real variable (id: 420…982)
subject to
└─ == constraint (affine)
└─ + (affine; real)
├─ index (affine; real)
│ └─ …
└─ 8128×1 Matrix{Float64}
using Images

# Result by nuclear norm minimization
colorview(Gray, X.value)
We can also complete the matrix by the nonnegative matrix factorization (NMF) approach: \[
\text{minimize} \|P_{\Omega}(\mathbf{Y}) - P_{\Omega}(\mathbf{V}\mathbf{W})\|_F^2
\]\[
\text{subject to } \mathbf{V} \geq 0, \mathbf{W} \geq 0
\] where \(P_{\Omega}(\mathbf{Y})\) is the projection of \(\mathbf{Y}\) onto the observed entries \(\Omega\). Because the problem is non-convex, it has multiple local optima; nonetheless it can be attacked with alternating least squares (ALS) or multiplicative update rules. ALS, a popular choice, alternates between fixing one factor and solving for the other.
function mc_by_nnmf(Y, r; solver = SCS.Optimizer)
    # Linear indices of obs. entries
    obsidx = findall(Y[:] .≠ 0.0)
    yobs = Y[obsidx]
    # Create optimization variables
    V = Variable(size(Y, 1), r)
    W = Variable(r, size(Y, 2))
    # Set up optimization problem
    problem = minimize(norm(yobs - (V * W)[obsidx]), [V >= 0, W >= 0])
    # Initialize value of W
    set_value!(W, rand(r, size(Y, 2)))
    # Do 20 iterations of alternating minimization
    for iter in 1:20
        # Solve for V, with W fixed
        fix!(W)
        solve!(problem, solver; warmstart = iter > 1 ? true : false, silent = true)
        free!(W)
        # Solve for W, with V fixed
        fix!(V)
        solve!(problem, solver; warmstart = true, silent = true)
        free!(V)
    end
    # Return the solution
    return V.value, W.value
end
mc_by_nnmf (generic function with 1 method)
V, W =mc_by_nnmf(Y, 30; solver = Gurobi.Optimizer)
Set parameter Username
Academic license - for non-commercial use only - expires 2026-02-15
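The multiplicative update rules mentioned above can also be sketched without any conic solver. The following is a generic Lee-Seung-style masked NMF in plain Julia; the function and names `nmf_mu`, `M`, `iters` are ours, not from the lecture:

```julia
using LinearAlgebra, Random

# Multiplicative updates for minimizing ‖M ∘ (Y − V*W)‖²_F over V, W ≥ 0,
# where M is a 0/1 observation mask (a generic sketch, not the ALS above)
function nmf_mu(Y, M, r; iters = 500, rng = MersenneTwister(1))
    mrow, ncol = size(Y)
    V, W = rand(rng, mrow, r), rand(rng, r, ncol)
    ε = 1e-12  # guard against division by zero
    for _ in 1:iters
        # each update multiplies elementwise by a nonnegative ratio,
        # so V and W stay nonnegative throughout
        V .*= ((M .* Y) * W') ./ ((M .* (V * W)) * W' .+ ε)
        W .*= (V' * (M .* Y)) ./ (V' * (M .* (V * W)) .+ ε)
    end
    return V, W
end
```

Unlike the ALS loop above, each iteration here costs only a few matrix multiplies, at the price of slower convergence near a local optimum.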
1.4 Nonlinear programming (NLP)
We use the MLE of the Gamma distribution to illustrate some rudiments of nonlinear programming (NLP) in Julia.
Let \(x_1,\ldots,x_m\) be a random sample from the gamma density \[
f(x) = \Gamma(\alpha)^{-1} \beta^{\alpha} x^{\alpha-1} e^{-\beta x}
\] on \((0,\infty)\). The loglikelihood function is \[
L(\alpha, \beta) = m [- \ln \Gamma(\alpha) + \alpha \ln \beta + (\alpha - 1)\overline{\ln x} - \beta \bar x],
\] where \(\bar{x} = \frac{1}{m} \sum_{i=1}^m x_i\) and \(\overline{\ln x} = \frac{1}{m} \sum_{i=1}^m \ln x_i\).
1.4.1 Define NLP optimization problem using Optimization.jl
WARNING: using Distributions.Categorical in module Main conflicts with an existing identifier.
WARNING: using Distributions.scale! in module Main conflicts with an existing identifier.
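The cell that produced the Ipopt run below is not reproduced here. As a stdlib-only illustration of the loglikelihood above: setting \(\partial L/\partial \beta = 0\) gives \(\hat\beta = \hat\alpha / \bar{x}\), and profiling out \(\beta\) leaves the one-dimensional equation \(\log \alpha - \psi(\alpha) = \log \bar{x} - \overline{\ln x}\), which we can solve by bisection. The helper `digamma_approx` is our own approximation (asymptotic series plus recurrence), not a function from the lecture:

```julia
# digamma ψ(x) via the recurrence ψ(x) = ψ(x+1) − 1/x plus an
# asymptotic series once x is large enough (our own helper)
function digamma_approx(x)
    r = 0.0
    while x < 6
        r -= 1 / x
        x += 1
    end
    f = 1 / (x * x)
    return r + log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))
end

# solve log(α) − ψ(α) = s for α; the left side is positive and
# strictly decreasing in α, so geometric bisection works
function gamma_mle_shape(s; lo = 1e-8, hi = 1e8)
    g(α) = log(α) - digamma_approx(α) - s
    for _ in 1:100
        mid = sqrt(lo * hi)  # bisect on the log scale since α spans magnitudes
        g(mid) > 0 ? (lo = mid) : (hi = mid)
    end
    return sqrt(lo * hi)
end
```

For a data vector `x`, the MLE is then `α̂ = gamma_mle_shape(log(mean(x)) - mean(log, x))` and `β̂ = α̂ / mean(x)`, which should agree with the solver output below up to numerical tolerance.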
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
This is Ipopt version 3.14.17, running with linear solver MUMPS 5.7.3.
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 3
Total number of variables............................: 2
variables with only lower bounds: 2
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 1.3570624e+04 0.00e+00 9.90e+01 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0
1 4.2328683e+03 0.00e+00 3.32e+02 -1.0 1.93e+01 - 1.00e+00 5.12e-02f 1
2 4.0086073e+03 0.00e+00 1.51e+02 -1.0 8.38e-02 - 7.22e-01 1.00e+00f 1
3 3.7647589e+03 0.00e+00 7.75e+01 -1.0 2.59e-01 - 7.47e-01 1.00e+00f 1
4 3.5568603e+03 0.00e+00 3.56e+01 -1.0 4.46e-01 - 8.02e-01 1.00e+00f 1
5 3.4205063e+03 0.00e+00 1.56e+01 -1.0 6.28e-01 - 1.00e+00 1.00e+00f 1
6 3.3700715e+03 0.00e+00 4.48e+00 -1.0 6.41e-01 - 1.00e+00 1.00e+00f 1
7 3.3636757e+03 0.00e+00 8.00e-01 -1.0 4.07e-01 - 1.00e+00 1.00e+00f 1
8 3.3636188e+03 0.00e+00 1.22e-03 -1.7 2.01e-02 - 1.00e+00 1.00e+00f 1
9 3.3635946e+03 0.00e+00 9.18e-04 -3.8 2.68e-02 - 1.00e+00 1.00e+00f 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
10 3.3635946e+03 0.00e+00 4.86e-08 -5.7 1.94e-04 - 1.00e+00 1.00e+00f 1
11 3.3635946e+03 0.00e+00 7.90e-12 -8.6 2.47e-06 - 1.00e+00 1.00e+00f 1
Number of Iterations....: 11
(scaled) (unscaled)
Objective...............: 2.6757578428056711e+01 3.3635945815883747e+03
Dual infeasibility......: 7.9016926883909898e-12 9.9329207923314562e-10
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Variable bound violation: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 2.5076199894927938e-09 3.1522348078017871e-07
Overall NLP error.......: 2.5076199894927938e-09 3.1522348078017871e-07
Number of objective function evaluations = 12
Number of objective gradient evaluations = 12
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 11
Total seconds in IPOPT = 0.650
EXIT: Optimal Solution Found.